- README: WAIS Unix release 8 b3 Fri Sep 13 1991
- Brewster Kahle Thinking Machines Corp
- Brewster@think.com
- ------------------------------------------------------------------------
-
- WARRANTY DISCLAIMER
-
- This software was created by Thinking Machines Corporation and is distributed
- free of charge. It is placed in the public domain and permission is granted
- to anyone to use, duplicate, modify and redistribute it provided that this
- notice is attached.
-
- Thinking Machines Corporation provides absolutely NO WARRANTY OF ANY KIND with
- respect to this software. The entire risk as to the quality and performance
- of this software is with the user. IN NO EVENT WILL THINKING MACHINES
- CORPORATION BE LIABLE TO ANYONE FOR ANY DAMAGES ARISING OUT THE USE OF THIS
- SOFTWARE, INCLUDING, WITHOUT LIMITATION, DAMAGES RESULTING FROM LOST DATA OR
- LOST PROFITS, OR FOR ANY SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES.
-
-
- This directory contains software for creating a sample server, protocol
- software, and a low level user interface building tool. See the readme
- file in the next directory up for details on how to use this software. This
- file will describe the overall structure of this directory.
-
- To index files, a set of files named ir*.[ch] is used. The file
- ircfiles.c holds the rules for handling particular file formats;
- sophisticated users may want to add to it to support new formats.
- irbuild.c contains the main program for the indexer.
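
As an illustration of the kind of format rule a user might add, here is a
minimal sketch of a "separator" test that decides where one document ends
and the next begins in a mailbox-style file. The names mbox_separator and
count_documents are hypothetical and are not taken from ircfiles.c; the
real interface may look quite different.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical format rule: a line starting with "From " begins a new
 * document, as in a Unix mailbox file. */
bool mbox_separator(const char *line)
{
    return strncmp(line, "From ", 5) == 0;
}

/* Count the documents in a NULL-terminated array of lines by counting
 * separator lines. Text before the first separator is ignored. */
int count_documents(const char *const *lines)
{
    int count = 0;
    for (; *lines != NULL; lines++) {
        if (mbox_separator(*lines))
            count++;
    }
    return count;
}
```

Supporting another format would then mostly mean writing a different
separator test, for example one that matches a blank line or a fixed
header string.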
-
- To create a server, the file server.c is used.
-
- To make a server that does not use the inverted files (because you have
- your own database of some sort), change the ir.c file. This is not as easy
- to do as we would like it to be.
-
- The file ui.c is a low level tool for creating user interfaces. The
- directory ../ui is built on top of that file.
-
- The protocol software is split into Z39.50 files (z*) and WAIS protocol
- extension files (w*). These files will change dramatically as the
- standards committee makes changes. We will attempt to keep the ui.c
- interface the same.
-
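- The rough flow of a question from user to server is shown below. These are
- not strictly levels of calling; rather, many of these files are tools that
- are called from the higher levels. The list runs down through the client
- side to sockets.c, then back up through the same layers on the server side.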
-
- ../ui/shell-ui.c /* a user interface, could be ../x, any other */
- ui.c /* user interface toolkit */
- wprot.c + wutil.c /* wais protocol */
- zprot.c + zutil.c /* Z39.50 protocol */
- transport.c /* ready the message for transport */
- sockets.c /* use sockets as a transport */
- transport.c
- zprot.c + zutil.c
- wprot.c + wutil.c
- ir.c /* server toolkit file */
- irsearch.c /* searching the inverted file indices */
-
- If you want to make a new type of server, look in ir.c;
- if you want to make a new user interface, look in ui.c.
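
The layering in the flow above can be pictured as each level wrapping the
message on the way out and the server stripping the wrappers in reverse
order. The sketch below is only a toy model of that idea: the "tag:payload"
text format and the functions wrap and unwrap are invented for illustration
and bear no relation to the actual Z39.50 or WAIS encodings.

```c
#include <stdio.h>
#include <string.h>

/* Wrap a payload in one layer by prepending a made-up tag, giving
 * "tag:payload". Returns 0 on success, -1 if out is too small. */
int wrap(const char *tag, const char *payload, char *out, size_t out_size)
{
    int n = snprintf(out, out_size, "%s:%s", tag, payload);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}

/* Strip one layer: return a pointer to the payload that follows "tag:",
 * or NULL if the message does not start with that tag. */
const char *unwrap(const char *tag, const char *message)
{
    size_t len = strlen(tag);
    if (strncmp(message, tag, len) != 0 || message[len] != ':')
        return NULL;
    return message + len + 1;
}
```

A client in this model would call wrap once per layer (WAIS, then Z39.50,
then transport) before sending, and the server would call unwrap with the
same tags in the opposite order before handing the question to the search
code.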
-
-
- The Indexer
-
- This is a very simple inverted file system with primitive word weighting.
-
- The indexer builds an inverted index of all the words over 2 characters
- long in the input text. Words are truncated to 20 characters. Each word
- has a weight associated with it; the first time it occurs in a document it
- gets an extra weight of 5, and from then on, it gets a weight of 1 in that
- document. Words in the headline are worth an extra 10.
-
- Searching is done by finding the documents with the highest total word
- weight for the words in the query. Words taken from relevant documents
- (relevance feedback) are worth 10 times less than words in the query itself.
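
The word rules above can be sketched in a few lines of C. This is one
reading of the description, not the code in irhash.c or irinv.c: it assumes
every occurrence counts 1, the first occurrence in a document adds 5 on top
of that, and a headline occurrence adds a further 10. The function names
are hypothetical, and case folding is left out because the text does not
mention it.

```c
#include <stddef.h>
#include <string.h>

#define MIN_WORD_LEN 3   /* "over 2 characters long" */
#define MAX_WORD_LEN 20  /* longer words are truncated */

/* Copy word into out (at least MAX_WORD_LEN + 1 bytes), truncated to
 * MAX_WORD_LEN characters. Returns 0 if the word is too short to be
 * indexed, 1 otherwise. */
int normalize_word(const char *word, char *out)
{
    size_t len = strlen(word);
    if (len < MIN_WORD_LEN)
        return 0;
    if (len > MAX_WORD_LEN)
        len = MAX_WORD_LEN;
    memcpy(out, word, len);
    out[len] = '\0';
    return 1;
}

/* Weight contributed by a single occurrence of a word (assumed reading:
 * base 1, plus 5 for the first occurrence, plus 10 in the headline). */
int occurrence_weight(int seen_before, int in_headline)
{
    int w = 1;
    if (!seen_before)
        w += 5;
    if (in_headline)
        w += 10;
    return w;
}

/* Total weight of one word in a document body given as an array of
 * n words. Words are compared after truncation. */
int word_weight_in_doc(const char *word, const char *const *doc_words,
                       size_t n)
{
    char target[MAX_WORD_LEN + 1], current[MAX_WORD_LEN + 1];
    int total = 0, seen = 0;
    size_t i;

    if (!normalize_word(word, target))
        return 0;
    for (i = 0; i < n; i++) {
        if (normalize_word(doc_words[i], current) &&
            strcmp(current, target) == 0) {
            total += occurrence_weight(seen, 0);
            seen = 1;
        }
    }
    return total;
}
```

Under these assumptions a word that appears twice in the body of a
document scores 6 + 1 = 7 there. A search would add up such scores over
the words of the query and rank documents by the total.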
-
- For indexing files the structure is:
- irbuild.c /* top level main */
- irtfiles.c /* parses the file */
- (using ircfiles.c) /* rules for parsing files into documents */
- irfiles.c /* file structures except inverted file */
- irext.h /* defines interface with particular backends */
- irhash.c /* add_word definition for inverted file */
- irinv.c /* inverted file definition */
-
-